Local Statistical Modeling via a Cluster-Weighted Approach with Elliptical Distributions
Cluster-Weighted Modeling (CWM) is a mixture approach for modeling the joint probability of data coming from a heterogeneous population. Under Gaussian assumptions, we investigate statistical properties of CWM from both the theoretical and the numerical point of view; in particular, we show that CWM includes mixtures of distributions and mixtures of regressions as special cases. Further, we introduce CWM based on Student-t distributions, which provides more robust fitting for groups of observations with longer-than-normal tails or atypical observations. The theoretical results are illustrated through empirical studies on both real and simulated data.
Keywords: Cluster-Weighted Modeling, Mixture Models, Model-Based Clustering
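For reference, a Gaussian CWM with a scalar response Y and a d-dimensional covariate vector X is commonly written as below; the symbols are our own notation and are not taken from the abstract. Forcing all groups to share the covariate mean and covariance factors the covariate density out of the sum, which leaves a mixture of linear regressions with weights π_g, one way to see the mixture-of-regressions special case. The Student-t version described in the abstract replaces the Gaussian densities φ and φ_d with Student-t densities.

```latex
p(\mathbf{x}, y) = \sum_{g=1}^{G} \pi_g \,
  \phi\!\left(y;\, \beta_{0g} + \boldsymbol{\beta}_{1g}^{\top}\mathbf{x},\, \sigma_g^{2}\right)
  \phi_d\!\left(\mathbf{x};\, \boldsymbol{\mu}_g,\, \boldsymbol{\Sigma}_g\right),
  \qquad \pi_g > 0,\quad \sum_{g=1}^{G} \pi_g = 1,

\boldsymbol{\mu}_g = \boldsymbol{\mu},\ \boldsymbol{\Sigma}_g = \boldsymbol{\Sigma}\ \ \forall g
\;\Longrightarrow\;
p(y \mid \mathbf{x}) = \sum_{g=1}^{G} \pi_g \,
  \phi\!\left(y;\, \beta_{0g} + \boldsymbol{\beta}_{1g}^{\top}\mathbf{x},\, \sigma_g^{2}\right).
```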
flexCWM: A Flexible Framework for Cluster-Weighted Models
Cluster-weighted models (CWMs) are mixtures of regression models with random covariates. Although CWMs have recently become rather popular in statistics and data mining, they are still poorly supported within the most popular statistical suites. In this paper, we introduce flexCWM, an R package specifically conceived for fitting CWMs. The package supports modeling the conditional distribution of the response variable by means of the most common distributions of the exponential family and by the t distribution. Covariates may be of mixed type, and parsimonious modeling of multivariate normal covariates, based on the eigenvalue decomposition of the component covariance matrices, is supported. Furthermore, either the response distribution or the covariate distribution can be omitted, yielding mixtures of distributions and mixtures of regression models with fixed covariates, respectively. The expectation-maximization (EM) algorithm is used to obtain maximum-likelihood estimates of the parameters, and likelihood-based information criteria are adopted to select the number of groups and/or a parsimonious model. For the component regression coefficients, standard errors and significance tests are also provided. Parallel computation can be used on multicore PCs and computer clusters when several models have to be fitted. To exemplify the use of the package, applications to artificial and real datasets included in the package are presented.
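To make the EM-plus-information-criterion workflow concrete, here is a minimal pure-Python sketch for a Gaussian CWM with a single numeric covariate. It is not flexCWM's code or API (flexCWM is an R package); the function names, the random soft initialisation with several starts, and the variance floor are our own assumptions. BIC is written here as -2 log L + k log n, so lower is better; some software uses the opposite sign.

```python
import math
import random


def log_normal(v, mean, var):
    """Log-density of N(mean, var) at v."""
    return -0.5 * (math.log(2 * math.pi * var) + (v - mean) ** 2 / var)


def logsumexp(vals):
    """Numerically stable log(sum(exp(vals)))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def m_step(x, y, z, floor):
    """Weighted updates; each component is (pi, mu, tau2, b0, b1, sigma2)."""
    n, params = len(x), []
    for g in range(len(z[0])):
        w = [row[g] for row in z]
        ng = max(sum(w), floor)
        mu = sum(wi * xi for wi, xi in zip(w, x)) / ng
        ybar = sum(wi * yi for wi, yi in zip(w, y)) / ng
        sxx = sum(wi * (xi - mu) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mu) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
        b1 = sxy / max(sxx, floor)  # weighted least-squares slope
        b0 = ybar - b1 * mu         # weighted least-squares intercept
        rss = sum(wi * (yi - b0 - b1 * xi) ** 2 for wi, xi, yi in zip(w, x, y))
        params.append((ng / n, mu, max(sxx / ng, floor), b0, b1, max(rss / ng, floor)))
    return params


def e_step(x, y, params):
    """Posterior memberships from the joint density pi_g * phi(y|x) * phi(x)."""
    z, loglik = [], 0.0
    for xi, yi in zip(x, y):
        logs = [math.log(p) + log_normal(yi, b0 + b1 * xi, s2) + log_normal(xi, mu, t2)
                for p, mu, t2, b0, b1, s2 in params]
        lse = logsumexp(logs)
        loglik += lse
        z.append([math.exp(l - lse) for l in logs])
    return z, loglik


def fit_cwm(x, y, G, n_starts=5, max_iter=500, tol=1e-8, floor=1e-6, seed=0):
    """Best of several random EM starts; returns (params, loglik, bic)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        z = []
        for _ in x:
            row = [rng.random() for _ in range(G)]
            z.append([r / sum(row) for r in row])
        loglik = -math.inf
        for _ in range(max_iter):
            params = m_step(x, y, z, floor)
            z, new_loglik = e_step(x, y, params)
            converged = new_loglik - loglik < tol
            loglik = new_loglik
            if converged:
                break
        if best is None or loglik > best[1]:
            best = (params, loglik)
    params, loglik = best
    k = 6 * G - 1  # 5 parameters per component plus G - 1 free weights
    return params, loglik, -2 * loglik + k * math.log(len(x))


# Simulated data: two groups with different covariate centres and regression lines.
rng = random.Random(1)
x, y = [], []
for center, b0, b1 in [(0.0, 1.0, 2.0), (6.0, -3.0, -1.0)]:
    for _ in range(200):
        xi = rng.gauss(center, 1.0)
        x.append(xi)
        y.append(b0 + b1 * xi + rng.gauss(0.0, 0.25))

fits = {G: fit_cwm(x, y, G) for G in (1, 2)}
for G, (params, loglik, bic) in fits.items():
    print(G, round(loglik, 1), round(bic, 1))
```

On data like these, the two-group fit should have the smaller BIC and recover slopes close to 2 and -1. Several random starts are used because EM only finds a local maximum of the likelihood.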
The joint role of trimming and constraints in robust estimation for mixtures of Gaussian factor analyzers
Mixtures of Gaussian factor analyzers are powerful tools for modeling an unobserved heterogeneous population, offering, at the same time, dimension reduction and model-based clustering. The high prevalence of spurious solutions and the disturbing effects of outlying observations in maximum likelihood estimation may cause biased or misleading inferences. Restrictions on the component covariances are considered in order to avoid spurious solutions, and trimming is also adopted to provide robustness against violations of the normality assumptions of the underlying latent factors. A detailed AECM algorithm for this new approach is presented. Simulation results and an application to the AIS dataset illustrate the purpose and effectiveness of the proposed methodology.
Funding: Ministerio de Economía y Competitividad and FEDER, grant MTM2014-56235-C2-1-P; Consejería de Educación de la Junta de Castilla y León, grant VA212U13; grant FAR 2015 from the University of Milano-Bicocca; and grant FIR 2014 from the University of Catania.
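The trimming step can be sketched as follows: observations are ranked by their contribution to the mixture likelihood, and only the ⌈n(1 - α)⌉ best-supported ones are kept for the next parameter update, which is one common convention in the trimming literature. In this minimal sketch, univariate Gaussian components stand in for the factor-analyzer densities with covariance Λ_gΛ_gᵀ + Ψ_g, the parameters are fixed rather than estimated, and the function name `trimmed_indices` is hypothetical; it is not the paper's AECM algorithm.

```python
import math


def log_normal(v, mean, var):
    """Log-density of N(mean, var) at v."""
    return -0.5 * (math.log(2 * math.pi * var) + (v - mean) ** 2 / var)


def logsumexp(vals):
    """Numerically stable log(sum(exp(vals)))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def trimmed_indices(data, weights, means, variances, alpha):
    """Keep the n - floor(n * alpha) points with the largest mixture log-density."""
    n = len(data)
    n_keep = n - math.floor(alpha * n)  # equals ceil(n * (1 - alpha))
    scores = [logsumexp([math.log(w) + log_normal(v, m, s2)
                         for w, m, s2 in zip(weights, means, variances)])
              for v in data]
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


data = [-0.5, 0.1, 0.3, 0.9, 4.6, 5.0, 5.2, 5.8, 4.9, 100.0]
kept = trimmed_indices(data, [0.5, 0.5], [0.0, 5.0], [1.0, 1.0], alpha=0.1)
print(kept)  # the gross outlier at index 9 is excluded
```

In a full estimator, this selection would alternate with the parameter updates, so the trimmed set can change from one iteration to the next.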
Robust estimation for mixtures of Gaussian factor analyzers, based on trimming and constraints
Mixtures of Gaussian factor analyzers are powerful tools for modeling an unobserved heterogeneous population, offering, at the same time, dimension reduction and model-based clustering. Unfortunately, the high prevalence of spurious solutions and the disturbing effects of outlying observations during maximum likelihood estimation raise serious issues. In this paper, we consider restrictions on the component covariances, to avoid spurious solutions, and trimming, to provide robustness against violations of the normality assumptions of the underlying latent factors. A detailed AECM algorithm for this new approach is presented. Simulation results and an application to the AIS dataset illustrate the purpose and effectiveness of the proposed methodology.
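A constraint of this kind can be sketched as a ratio restriction: all variances across groups are clipped into an interval [m, c·m], so their largest-to-smallest ratio is at most c, and the threshold m is chosen to minimise a weighted Gaussian-likelihood-type loss over the clipped values. Applying it to the diagonal uniqueness variances Ψ_g is our assumption (the abstract only says "restrictions on the component covariances"), and the search over breakpoints and closed-form stationary points is our own construction, not the paper's algorithm. The function name `constrain_variances` is hypothetical.

```python
import math


def constrain_variances(d, n_g, c):
    """Clip every variance d[g][j] into [m, c*m], choosing m to minimise
    sum_g n_g * sum_j (log d*_gj + d_gj / d*_gj) over the clipped values d*."""
    flat = [(ng, v) for ng, row in zip(n_g, d) for v in row]

    def clip(v, m):
        return min(max(v, m), c * m)

    def objective(m):
        return sum(ng * (math.log(clip(v, m)) + v / clip(v, m)) for ng, v in flat)

    # The objective is piecewise smooth with kinks at every v and v / c;
    # outside [min kink, max kink] it is monotone, so the optimum lies inside.
    breaks = sorted({v for _, v in flat} | {v / c for _, v in flat})
    candidates = list(breaks)
    for lo, hi in zip(breaks, breaks[1:]):
        mid = 0.5 * (lo + hi)
        # Inside (lo, hi) the sets clipped up (v < m) and down (v > c*m) are fixed,
        # so a zero derivative gives a closed-form stationary point.
        below = [(ng, v) for ng, v in flat if v < mid]
        above = [(ng, v) for ng, v in flat if v > c * mid]
        den = sum(ng for ng, _ in below) + sum(ng for ng, _ in above)
        if den > 0:
            m = (sum(ng * v for ng, v in below) + sum(ng * v / c for ng, v in above)) / den
            if lo < m < hi:
                candidates.append(m)
    m_best = min(candidates, key=objective)
    return [[clip(v, m_best) for v in row] for row in d], m_best


psi, m = constrain_variances([[1.0, 2.0], [50.0, 0.5]], n_g=[10, 10], c=10.0)
print(psi, m)
```

When the unconstrained estimates already satisfy the ratio bound, the minimiser leaves them unchanged, because each term log a + v/a is smallest at a = v.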
Robust estimation of mixtures of regressions with random covariates, via trimming and constraints
A robust estimator for a wide family of mixtures of linear regressions is presented. Robustness is based on the joint adoption of the Cluster-Weighted Model and of an estimator based on trimming and restrictions. The selected model provides the conditional distribution of the response for each group, as in mixtures of regressions, and further supplies local distributions for the explanatory variables. A novel version of the restrictions has been devised under this model to separately control the two sources of variability identified in it. This proposal avoids singularities in the log-likelihood, caused by approximate local collinearity in the explanatory variables or by local exact fits in the regressions, and reduces the occurrence of spurious local maximizers. Owing to the interaction between the model and the estimator, the procedure naturally resists the harmful influence of bad leverage points in the estimation of the mixture of regressions, which is still an open issue in the literature. The methodology defines a well-posed statistical problem whose estimator exists and is consistent for the corresponding population optimum under very general conditions. A feasible EM algorithm is also provided to compute the estimate. Many simulated examples and two real datasets show the ability of the procedure, on the one hand, to detect anomalous data and, on the other hand, to identify the true cluster regressions without the influence of contamination.
Keywords: Cluster-Weighted Modeling · Mixtures of Regressions · Robustness
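The claim that combining the CWM with trimming exposes bad leverage points can be illustrated with a toy computation. Assuming two hypothetical, already-fitted Gaussian CWM components (the parameter values below are invented for illustration), a point whose response lies exactly on the first regression line but whose covariate is far from both groups scores well under the conditional, mixture-of-regressions density, yet worst under the joint CWM density, so trimming by joint contribution discards it. This only illustrates the mechanism; it is not the paper's estimator, which also imposes restrictions and runs an EM algorithm.

```python
import math


def log_normal(v, mean, var):
    """Log-density of N(mean, var) at v."""
    return -0.5 * (math.log(2 * math.pi * var) + (v - mean) ** 2 / var)


def logsumexp(vals):
    """Numerically stable log(sum(exp(vals)))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


# Hypothetical fitted components: (weight, mu_x, var_x, intercept, slope, var_y).
PARAMS = [(0.5, 0.0, 1.0, 1.0, 2.0, 0.25), (0.5, 6.0, 1.0, -3.0, -1.0, 0.25)]


def conditional_score(x, y):
    """log sum_g pi_g phi(y | x): the mixture-of-regressions contribution."""
    return logsumexp([math.log(p) + log_normal(y, b0 + b1 * x, s2)
                      for p, _, _, b0, b1, s2 in PARAMS])


def joint_score(x, y):
    """log sum_g pi_g phi(y | x) phi(x): the cluster-weighted contribution."""
    return logsumexp([math.log(p) + log_normal(y, b0 + b1 * x, s2) + log_normal(x, mu, t2)
                      for p, mu, t2, b0, b1, s2 in PARAMS])


def trim_lowest(points, score, n_trim):
    """Indices of the n_trim points with the smallest score."""
    ranked = sorted(range(len(points)), key=lambda i: score(*points[i]))
    return sorted(ranked[:n_trim])


points = [(0.0, 1.2), (0.5, 1.9), (-0.7, -0.5), (1.0, 3.1),
          (6.0, -8.8), (5.5, -8.6), (6.4, -9.3), (7.0, -10.2),
          (20.0, 41.0)]  # last point: y = 1 + 2x exactly, but x is far from both groups
LEVERAGE = len(points) - 1

print("conditional trims", trim_lowest(points, conditional_score, 1))
print("joint trims", trim_lowest(points, joint_score, 1))
```

Under the conditional score, the leverage point is actually the best-fitting observation, which is why a model without a covariate density has no handle on it.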
- …